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PENALIZED MAXIMUM LIKELIHOOD ESTIMATION AND 
VARIABLE SELECTION IN GEOSTATISTICS 

(n: 

^ I By Tingjin Chu, Jun Zhv^ and Haonan Wang^ 

^ . Colorado State University, University of Wisconsin, Madison, 

' and Colorado State University 

pL^ , We consider the problem of selecting covariates in spatial lin- 

ear models with Gaussian process errors. Penalized maximum likeli- 
I hood estimation (PMLE) that enables simultaneous variable selection 

and parameter estimation is developed and, for ease of computation, 
PMLE is approximated by one-step sparse estimation (OSE). To fur- 
I ther improve computational efficiency, particularly with large sample 

sizes, we propose penalized maximum covariance-tapered likelihood 
estimation (PMLEt) and its one-step sparse estimation (OSEt). 
General forms of penalty functions with an emphasis on smoothly 
clipped absolute deviation are used for penalized maximum likeli- 
^ ' hood. Theoretical properties of PMLE and OSE, as well as their 

approximations PMLEt and OSEt using covariance tapering, are de- 
■ rived, including consistency, sparsity, asymptotic normality and the 

^ ' oracle properties. For covariance tapering, a by-product of our theo 

c I ^ ■ retical results is consistency and asymptotic normality of maximum 

(N- covariance-tapered likelihood estimates. Finite-sample properties of 

I the proposed methods are demonstrated in a simulation study and, 

. for illustration, the methods are applied to analyze two real data sets. 

o: 



1. Introduction. Geostatistical models are popular tools for the analysis 
of spatial data in many disciplines. It is often of interest to estimate model 
parameters based on data at sampled locations and perform spatial interpo- 
lation (also known as Kriging) of a response variable at unsampled locations 
^ . within a spatial domain of interest [2, 16, 17]. In addition, a practical issue 

■ that often arises is how to select the best model or a best subset of models 
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among many competing ones [10]. Here we focus on selecting covariates in 
a spatial linear model, which we believe is a problem that is underdeveloped 
in both theory and methodology despite its importance in geostatistics. The 
spatial linear model for a response variable under consideration has two ad- 
ditive components: a fixed linear regression term and a stochastic error term. 
We assume that the error term follows a Gaussian process with mean zero 
and a covariance function that accounts for spatial dependence. Our chief 
objective is to develop a set of new methods for the selection of covariates 
and establish their asymptotic properties. Moreover, we devise efficient algo- 
rithms for computation, making these methods feasible for practical usage. 

For linear regression with independent errors, variable selection has been 
widely studied in the literature. The more traditional methods often involve 
hypothesis testing such as F-tests in a stepwise selection procedure [3]. An 
alternative approach is to select models using information discrepancy such 
as a Kolmogorov-Smirnov, Kullback-Leibler or Hellinger discrepancy [13]. 
In recent years, penalized methods are becoming increasingly popular for 
variable selection. For example, Tibshirani [18] developed a least absolute 
shrinkage and selection operator (LASSO), whereas Fan and Li [7] pro- 
posed a nonconcave penalized likelihood method with a smoothly clipped 
absolute deviation (SCAD) penalty. Efron et al. [5] devised least angle re- 
gression (LARS) algorithms, which allow computing all LASSO estimates 
along a path of its tuning parameters at a low computational order. More 
recently, Zou [23] improved LASSO and the resulting adaptive LASSO en- 
joys the oracle properties as SCAD, in terms of selecting the true model. Zou 
and Li [24] proposed one-step sparse estimation in the nonconcave penalized 
likelihood approach, which retains the oracle properties and utilizes LARS 
algorithms. 

For spatial linear models in geostatistics, in contrast, statistical methods 
for a principled selection of covariates are limited. Hoeting et al. [10] sug- 
gested Akaike's information criterion (AIC) with a finite-sample correction 
for variable selection. Like information-based selection in general, compu- 
tation can be costly especially when the number of covariates and/or the 
sample sizes are large. Thus, these authors considered only a subset of the co- 
variates that may be related to the abundance of the orange-throated whip- 
tail lizard in southern California, in order to make it tractable to evaluate 
their AlC-based model selection. Huang and Chen [11] developed a model 
selection criterion in geostatistics, but for the purpose of Kriging rather 
than selection of covariates. Further, Wang and Zhu [21] proposed penalized 
least squares (PLS) for a spatial linear model where the error process is 
assumed to be strong mixing without the assumption of a Gaussian process. 
This method includes spatial autocorrelation only indirectly in the sense 
that the objective function involves a sum of squared errors ignoring spatial 
dependence. A spatial block bootstrap is then used to account for spatial 
dependence when estimating the variance of PLS estimates. 
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Here we take an alternative, parametric approach and assume that the 
errors in the spatial linear model follow a Gaussian process. Our main in- 
novation here is to incorporate spatial dependence directly into a penalized 
likelihood function and achieve greater efficiency in the resulting penalized 
maximum likelihood estimates (PMLE). Unlike computation of PLS esti- 
mates which is on the same order as ordinary least squares estimates, how- 
ever, penalized likelihood function for a spatial linear model will involve 
operations of a covariance matrix of the same size as the number of observa- 
tions. Thus, the computational cost can be prohibitively high as the sample 
size becomes large. It is essential that our new methods address this issue. To 
that end, we utilize one-step sparse estimation (OSE) and LARS algorithms 
in the computation of PMLE to gain computational efficiency. In addition, 
we explore covariance tapering, which further reduces computational cost 
by replacing the exact covariance matrix with a sparse one [4, 9, 12]. We 
establish the asymptotic properties of both PMLE and OSE, as well as their 
covariance-tapered counterparts. As a by-product, we establish new results 
for covariance-tapered MLE which, to the best of our knowledge, have not 
been established before and can be of independent interest. 

The remainder of the paper is organized as follows. In Section 2 we develop 
penalized maximum covariance-tapered likelihood estimation (PMLEt) that 
enables simultaneous variable selection and parameter estimation, as well as 
an approximation of the PMLEt by one-step sparse estimation (OSEt) to 
enhance computational efficiency. PMLE and OSE are regarded as a special 
case of PMLEt and OSEt- We establish asymptotic properties of PMLE 
and OSE in Section 3 and those of PMLEt and OSEt under covariance 
tapering in Section 4. In Section 5 finite-sample properties of the proposed 
methods are investigated in a simulation study and, for illustration, the 
methods are applied to analyze two real data sets. We outline the technical 
proofs in Appendices A.l and A. 2. 

2. Maximum likelihood estimation: Penalization and covariance tapering. 

2.1. Spatial linear model and maximum likelihood estimation. For a spa- 
tial domain of interest R in M'', we consider a spatial process {y(s) : s G i?} 
such that 



where x(s) = (xi(s), . . . ,Xp{s))'^ is a p x 1 vector of covariates at location s 
and P = (/?!, . . . ,f^p)'^ is a p X 1 vector of regression coefficients. We assume 
that the error process {e(s) : s G R} is a Gaussian process with mean zero 
and a covariance function 



where s,s' G i? and is a q x 1 vector of covariance function parameters. 





(2.2) 
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Let si, . . . jSat denote N sampling sites in R. Let y = {y{si), . . . ,y{s]\[))'^ 
denote an iV x 1 vector of response variables and Xj = {xj{si), . . . ,Xj{sj\[))'^ 
denote an x 1 vector of the jth covariate with j = 1, . . . ,p, at the N 
sampling sites. Further, let X = [xi, . . . ,Xp] denote an x p design matrix 
of covariates and T = [7(sj,Sj/;0)]^,^-^ denote an x covariance matrix. 
In this paper, we consider general forms for the the covariance matrix T 
and describe suitable regularity conditions in Sections 3 and 4. By (2.1) 
and (2.2), we have 

(2.3) y^N{X(3,T). 

Let ri = {0^ , O'^)'^ denote a (p + g) x 1 vector of model parameters consist- 
ing of both regression coefficients (3 and covariance function parameters 6. 
By (2.3), the log-likelihood function of ij is 

%;y,X) = -(Ar/2)log(27r) - (l/2)log|r| 

(2.4) 

-(l/2)(y-X/3)^r-i(y-X/3). 

Let ?)mle — 3'i^g™3,x^{^(77; y, X)} denote the maximum likelihood estimate 
(MLE) of rj. 

2.2. Covariance tapering and penalized maximum likelihood. It is well 
known that computation of MLE for a spatial linear model is of order A^^ 
and can be very demanding when the sample size A'^ increases [2]. There are 
various approaches to alleviating the computational cost. Here we consider 
covariance tapering, which could effectively reduce our computational cost in 
practice. Furrer et al. [9] considered tapering for Kriging and demonstrated 
that not only tapering enhances computational efficiency but also achieves 
asymptotically optimality in terms of mean squared prediction errors under 
infill asymptotics. For parameter estimation via maximum likelihood, Kauf- 
man et al. [12] established consistency of tapered MLE, whereas Du et al. [4] 
established the asymptotic distribution, also under infill asymptotics. How- 
ever, both Kaufman et al. [12] and Du et al. [4] focused on the parameters in 
the Matern family of covariance functions and did not consider estimation of 
the regression coefficients. In contrast, our primary interest is in the estima- 
tion of regression coefficients and we investigate the asymptotic properties 
under increasing domain asymptotics, which, to the best of our knowledge, 
have not been established in the literature before. 

Recall that F = [7(si, Sj/)]^,^-,^ is the covariance matrix of y. Assum- 
ing second-order stationarity and isotropy, we let j{d) =7(s,s'), where 
(i = |js — s'll is the lag distance between two sampling sites s and s' in R. 
Let KT{d,Lj) denote a tapering function, which is an isotropic autocorre- 
lation function when < d < u and when d> ui, for a given threshold 
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distance u > 0. Compactly supported correlation functions can be used as 
the tapering functions [22]. For example, 

(2.5) KT{d,uj) = {l-d/u})+, 

where x+ = max{a;,0}, in which case the correlation is at lag distance 
greater than the threshold distance to. Let A(a;) = [KTidu' ,co)]f-,^^ denote 
an X tapering matrix. Then a tapered covariance matrix of T is defined 
as Ft = Fo A(a;), where o denotes the Hadamard product (i.e., elementwise 
product). 

We approximate the log-likelihood function by replacing F in (2.4) with 
the tapered covariance matrix Ft and obtain a covariance-tapered log- 
likelihood function 

^T(^;y,X) = -(iV/2)log(27r) - (l/2)log|FT| 

(2.6) 

-il/2)iy-X/3fT^Hy-Xf3). 

We let ??mlEt = ^^S^^^vi^'^i'T^y^-^)} denote the maximum covariance- 
tapered likelihood estimate (MLEt) of rj. 

Let Ffc,T = dTT/d9k = Tk o A{u), F^ = dT^^/d9k = F^' o A(a;), Tkk',T = 
d'^TT/dOk dOk' = Tkk' o A{uj), and Ff' = a^F^V^^fc dOk' = V"^' o /^{uj) de- 
note the covariance-tapered version of F^, F^, Tkk' and F'^*^ , respectively. 
From (2.6), ^^{p) = X'^F-^(y - X/3) and the fcth element of (.'^{6) is 
-(l/2)tr(FT^Ffc,T) - (l/2)(y - X/3)^F^(y - X/3). Moreover, 4(/3,/3) = 
-X^F^^X, the kih. column of 4(/3, 6) is X^F^(y - X/3), and the {k, k')t\i 

entry of 4(0,0) is -{l/2){tv{T^''Tkk' ,T + T^TTk> + (y - X/3)^F^'='(y - 
X/3)}. Since E{—£'^((3, 9)} = 0, the covariance-tapered information ma- 
trix of ri is It(?7) = diag{lT(/3), It(0)}, where It(/3) = -E{-4(/3,/3)} = 
X^F^^X and the {k,k')th entry of 1^(6) = £{-£'^{.{0,6)} is ifcfc',T/2 with 

tkk',T = tr(F;^^Ffc^Tr^^Ffc/^T) = tr(FTF^ x FtF^). 

Now, we define a covariance-tapered penalized log-likelihood function as 

p 

(2.7) QT(r?) =£T(r7;y,X) - nY^Px^)^ 

i=i 

where iTiV'^y^^) is a covariance-tapered log-likelihood function as defined 
in (2.6). Moreover, we let ^?pmlEt = argmax^{QT('7)} denote the penalized 
maximum covariance-tapered likelihood estimate (PMLEt) of rj. 

For penalty functions, we mainly consider smoothly clipped absolute de- 
viation (SCAD) defined as 

if|/3|<A, 

A2 + (a-l)-i(aA|/3|-/3V2-aA2 + AV2), 
.(a + l)AV2, if|/3|>aA, 
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for some a > 2 [6]. For i.i.d. error in standard linear regression, variable 
selection and parameter estimation under the SCAD penalty are shown 
to possess three desirable properties: unbiasedness, sparsity and continu- 
ity [7]. For spatial linear regression (2.1), these properties continue to hold 
for the SCAD penalty following arguments similar to those in Wang and 
Zhu [21]. 

To compute PMLEt under the SCAD penalty, Fan and Li [7] proposed 
a locally quadratic approximation (LQA) of the penalty function and a New- 
ton-Raphson algorithm. Although fast, a drawback of the LQA algorithm 
is that once a regression coefficient is shrunk to zero, it remains to be zero 
in the remainder iterations. More recently, Zou and Li [24] developed a uni- 
fied algorithm to improve computational efficiency, which, unlike the LQA 
algorithm, is based on the locally linear approximation (LLA) of the penalty 
function. Moreover, Zou and Li [24] proposed one-step LLA estimation that 
approximates the solution after just one iteration in a Newton-Raphson-type 
algorithm starting at the MLE. We extend this one-step LLA estimation to 
approximate PMLEt for the spatial linear model as follows. 

Algorithm 1. At the initialization step, we let rj'^^ = ?7mlEt "^ith 
fi!^^ = /3mlEt ^^"-^ ^ ~ ^mlEt • We then update (3 by maximizing 

(2.9) Q^(/3) = -(l/2)(y-X/3)^rT(<V^(y-X/3)-ivX:p',(|/3(¥l)|/3,-| 

with respect to (3, where the first term is from (2.6) and the second term 
is an LLA of the penalty function in (2.7). The resulting one-step sparse 
estimate (OSE) of /3 is denoted as /SqsEt- We may also update 6 by max- 
imizing (2.6) with respect to 9 given /SqsEt- '^^^ resulting OSE of 6 is 

denoted as ^osEt- We let /JosEt = (/^OSEt' ^OSEt)^ denote the OSEt of rj, 
which approximates ^)pmlEt- 

It is worth mentioning an alternative covariance-tapered log-likelihood 
function [12], 

^T2(^;y,X) = -(Af/2)log(27r) - (l/2)log|rT| 

(2.10) 

- (l/2)(y - XpfiT^' o A(a;)}(y - X/3). 

If the alternative covariance tapering is used in Algorithm 1, the resulting 
estimates of parameters, especially the range parameter, tend to be more 
accurate, but require more time to compute o A(a;) than F^^. For 
a numerical comparison, see Section 6.1 in Chu et al. [1]. 

Finally, two tuning parameters, A and a, in the SCAD penalty (2.8) need 
to be estimated. For computational ease, we fix a = 3.7 as recommended by 
Fan and Li [7]. To determine A, we use the Bayesian information criterion 
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(BIC); see Wang et al. [20]. In particular, let 

(2.11) a^x) = iV"Hy - x3(A)}^r{0(A)}-Hy - x3(A)}, 

where /3(A) and 0{X) are the PMLE obtained for a given A, and let 

(2.12) BIC(A) = iVlog{a2(A)} + A;(A) log(iV), 

where k{X) is the number of nonzero regression coefficients [19]. Thus, an 
estimate of A is A = argmin;^{BIC(A)}. 

When A(a;) is a matrix of I's, Ft = T and ^t(') = ^(O- Similarly, we 
henceforth obtain needed counterparts of the notation in this section under 
maximum hkehhood without covariance tapering by omitting T. For details 
regarding such notation, see Section 2 of Chu et al. [1]. 

3. Asymptotic properties of PMLE and OSE. 

3.1. Notation and assumptions. We let /3o = (/3i05 • • • j/^po)"^ = (/3iO'/^2o)'^ 
denote the true regression coefficients, where without loss of generality /3]^o is 
an s X 1 vector of nonzero regression coefficients and f320 = is a (p — s) x 1 
zero vector. Let Oq denote the vector of true covariance function parameters. 

We consider the asymptotic framework in Mardia and Marshall [14] and 
let n denote the stage of the asymptotics. In particular, write Rn = R, Nn = 
N, and A„ = A. Furthermore, define a„ = maxi<j<p{|p';^ (|/3jo|)h /3jo 7^ 0} 
and 6„ = maxi<,<p{jp'j;j|/3,o|)|:/3io/0}. Also, let 0(/3) =Va(I/3i|) sgn(/3i), 
. . . ,p'^{\^p\)sgn{(3p)f and *(/3) = dis.g{p'im), . . . ,p'iW)}. Moreover, de- 
note cj>niP) = <^(/3) *n(/3) = *(/3), both evaluated at A„. For all other 
quantities that depend on n, the stage n will be in either the left superscript 
or the right subscript. 

Recall that "t^fc' = tr("r-^"rfc"r-^"rfc/). Let m < ■ ■ ■ < hn„ denote the 
eigenvalues of "F. For 1 = 1,... ,Nn, let i^tf denote the eigenvalues of "F^, 
such that l^il < ••• < IMat^I l^t /xf''' denote the eigenvalues of "Ffefc/ 
such that l^^'^'l < • • • < Ima^JI- 

For an Nn x Nn matrix A = {aij)^^^^, the Frobenius, max and spectral 
norm are defined as ||A||i7' = Zlj^'i ^f^)^''^! || A||max = max{|ajj| :z, j = 

1, . . . ,Nn} and ||A||s = max{|;U/(A)| ■.1 = 1,... ,Nn}, where fJ-i{A) is the Ith 
eigenvalue of A. 

The following regularity conditions are assumed for Theorems 3.1 and 3.2: 

(A.l) For 9 G^l where is an open subset of M'^ such that ri eMP x Q, 
the covariance function ^{■,-;6) is twice differentiable with respect to 9 with 
continuous second-order derivatives and is positive definite in the sense that, 
for any Nn > 1 and si,...,SAr^, the covariance matrix r = [y{si,Sj;9)]^j^-^ 
is positive definite. 
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(A. 2) There exist positive constants C, Ck and Ckk', such that 
Um„_^oo /U7V„ = C < oo, hm„_^oo {(J-nJ = C'fc < oo, hm„_^oo I^^Jl = Ckk' < oo 
for all k,k' = 1, . . . ,q. 

(A. 3) For some 5 > 0, there exist positive constants Dfc, -Dfcfc' and -D^^./ 

such that (i) ||"rfc||~2 = D^iV"^/^"'^ for k = l,...,q; (ii) either 11^^ + 
"rfcHI:^' = Dkk'Nn^/^-^ or II^Ffc - "rfc^ll:^' = Dlf^,Nn^/^~^ for any k + k' . 

(A.4) For any k,k' = I, . . . ,q, (i) "a^fc' = lim„^oorifcfc'("ifcfc"*fc'fc')"^^^} 
exists and A.„ = ("-akk'Yk^y^i nonsingular; (ii) |"tfcfc"ifc'fe'~^| and 

|"*fc'fc'"^fcfc~"^| are bounded. 

(A. 5) The design matrix X has full rank p and is uniformly bounded in 
max norm with lim„_>.oo(X"^X)^^ = 0. 

(A. 6) There exists a positive constant Co, such that ll^r"^!!^ <Cq < oo. 

(A.7) For /3 G and G 0, N~Hn{(3) J(/9) and iV~^In(^) ^ J(0) 
as n ^ oo. 

— 1/2 

(A.8) an = 0{Nn ' ) and 6„ ^ as n — > oo. 

(A. 9) There exist positive constants ci and C2 such that, when /32 > 
ciA„, \p'l{/3,)-p'li/3,)\<C2\/3i-/32\. 

1/2 

(A. 10) A„^0,iV„'^A„ — 7> oo as n — )■ oo. 
(A.ll) liminf.„^ooliminf^_^o+ K^p'x„{f^) > 0- 

Conditions (A. 2), (A.3)(i), (A.4)(i) and (A. 5) are assumed in Mardia and 
Marshall [14]. Conditions (A.l) and (A. 5) are standard assumptions for 
MLE, whereas (A. 2), (A.3)(i), (A.4)(i) and (A. 6) ensure smoothness, growth 
and convergence of the information matrix [14]. Together with (A.7), they 
yield a central limit theorem oi i'{r]) and convergence in probability of (-"{ti). 
For establishing Theorems 3.1 and 3.2, only the parts (i) of (A. 3) and (A.4) 
are used. Moreover, the implicit asymptotic framework is increasing the 
domain, where the sample size Nn grows at the increase of the spatial do- 
main Rn [14]. Finally, (A.8)-(A.ll) are mild regularity conditions regarding 
the penalty function and are sufficient for Theorems 3.1 and 3.2 to hold [7] 
and [8]. 

3.2. Consistency and asymptotic normality of PMLE. 

Theorem 3.1. Under (A.1)-(A.9), there exists, with probability tending 
to one, a local maximizer ""T] ofQ{r]) such that \\"'T]~'no\\ = Op{Nn +a„). 
//, in addition, (A.lO)-(A.ll) hold, i/ien "77 = ("3f, ,"0^)^ satisfies: 

(i) Sparsity: '"/32 = with probability tending to 1. 

(ii) Asymptotic normality: 

A^y'{J(/3io) + *n(/3io)}["3i - /3io + {J(/3io) + *„(/3io)}-V„(/3io)] 
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^iV(0,J(/3io)), 

where J{(3iq) and ^niPio) consist of the first s x s upper-left suhmatrix of 
J(/3o) and $„(/3o), respectively. 

Theorem 3.1 establishes the asymptotic properties of PMLE. Under (A.l)- 
(A.9), there exists a local maximizer converging to the true parameter at the 

~Y l'2 — 1/2 

rate Op{Nn +an)- Since a„ = 0{Nn ) from (A. 8), the local maximizer 
is root-iV„ consistent. As shown in Fan and Li [7], the SCAD penalty func- 
tion satisfies (A.8)-(A.ll) by choosing an appropriate tuning parameter A„. 
Therefore, by Theorem 3.1, the PMLE under the SCAD penalty possesses 
the sparsity property and asymptotic normality. Moreover, when the sample 
size Nn is sufficiently large, ^n{Pio) will be close to zero. That is, perfor- 
mance of the PMLE is asymptotically as efficient as the MLE of (3i when 
knowing = 0. The arguments above hold for other penalty functions such 
as Lq penalty with < 1, but not (? = 1. 

3.3. Consistency and asymptotic normality of OSE. 

Theorem 3.2. Suppose that the initial value "t^C^) satisfies "^rj^^^ —Vo — 
Op{N~^^^). For the SCAD penalty, under (A.1)-(A.7) and (A. 10), the OSE 
"^OSE = r3?^osE'"3i^osE>"^osE)^ satisfies: 

(i) Sparsity: '"/32 qse — with probability tending to 1. 

(ii) Asymptotic normality: 

A^y'r3i,osE - /3io) A iV(0, J(/3io)-^), 

iV^'r^osE - e,) A iV(0, J(0o)-'), 
where J(/9]^o) consists of the first s x s upper-left submatrix of J{f3Q). 

Theorem 3.2 establishes the asymptotic properties of OSE such that the 
OSE is sparse and asymptotically normal under the SCAD penalty. The 
OSE for (3i and 6 has the same limiting distribution as the PMLE and thus 
achieves the same efficiency. In fact. Theorem 3.2 holds for another general 
class of penalty functions such that p'x^{-) = ^nP{-) where p'{-) is continuous 
on (0, do), and there is some a > such that = 0{f3~'^) as /3 — )■ 0-|- [24]. 
Following similar arguments for the SCAD penalty in our Theorem 3.2 and 

those in Zou and Li [24], it can be shown that, if Nn^°'^^'^ Xn — )> oo and 

1/2 

Nn A„ 0, Theorem 3.2 continues to hold. In practice, we set the initial 
value "■■q^'^^ to be the MLE "^yMLE; ^ satisfies the consistency condition. 
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4. Asymptotic properties under covariance tapering. 

4.1. Notation and assumptions. In order to establish the asymptotic 
properties under covariance tapering, we continue to assume (A.l)-(A.ll). 
We now restrict our attention to a second-order stationary error process 
in with an isotropic covariance function 7((i), where d > is the lag dis- 
tance. We also assume that the distance between any two sampling sites is 
greater than a constant [14]. As for the tapering function, we consider (2.5). 

Let 7fc(d) = d-f{d)/d0k, lkk'{d) = d^^{d)/dek dOy, for k,k' = I, . . . ,q. Two 
additional regularity conditions are assumed for Theorems 4.2 and 4.3: 

— 1/2 — 1/2 

(A. 12) < mfn{ujnNn } < sup,„{a;„A'^„ } < oo, where a;„ = a; is the 
threshold distance in the tapering function (2.5). 

(A. 13) There exists a nonincreasing function 70 with n^7o(u) du < 00 
such that max{|7(u)|, |7fc(u)|, |7fc,fc'(^)|} ^ 7o(^^) for all u G (0,00) and l<k, 
k' <q. 

From (A. 12), the threshold distance uin is bounded away from and grows 
1/2 

at the rate of Nn ■ The condition in (A. 13) has to do with the covariance 
function. It can be shown that they hold for some of the commonly-used co- 
variance functions such as the Matern class. Details are given in Appendix D 
of Chu et al. [1]. 

4.2. Consistency and asymptotic normality of PMLEj^. 

Proposition 4.1. Under (A.1)-(A.7) and (A.12)-(A.13), the MLEt 
""^mlEt asymptotically normal with 

Proposition 4.1 establishes the asymptotic normality of MLEt- In par- 
ticular, MLE and MLEt have the same limiting distribution. This implies 
that, under the regularity conditions, covariance-tapered MLE achieves the 
same efficiency as MLE. Thus, in Algorithm 1 for computing the OSEt, we 
may set the initial parameter values to "t7mlEt- 

Theorem 4.2. Under (A.1)-(A.9) and (A.12)-(A.13), there exists, with 
probability tending to one, a local maximizer '"'rj'Y of Q^ir]) defined in (2.7) 

^ —1/2 

such that W^rj^Y — tjqW = Op{Nn +On)- Ifj i'^ addition, (A.lO)-(A.ll) hold, 
then "r/T = ("3^T'"3i^T>"^T)^ satisfies: 

(i) Sparsity: ^(32 x = with probability tending to 1. 
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(ii) Asymptotic normality: 
K^HJiPlo) + *n(/3io)}r3l,T - PlO + {J(/3lo) + *n(/3io)}- Vn(/3lo)] 

AiV(0,J(/3io)), 

ivV2(«gT-0o) AiV(O,J(0o)-'), 

where J(/3]^o) ^^'^ ^"(/^lo) consist of the first s x s upper-left suhmatrix of 
J(/3o) and $„(/3q), respectively. 

In Theorem 4.2, PMLEt is shown to be consistent, sparse and asymptot- 
ically normal. In particular, PMLEt has the same asymptotic distribution 
as PMLE in Theorem 3.1. That is, PMLEt achieves the same efficiency 
and oracle property as PMLE asymptotically, yet in the mean time is more 
computationally efficient. 

4.3. Consistency and asymptotic normality of OSE'y- 

Theorem 4.3. Suppose that the initial value "r/^^ in Algorithm 1 satis- 
fies "rjfp^ ~ ''70 = Op{Nn ^^^). For the SCAD penalty function, under (A.l)- 
(A.7), (A.IO) and (A.12)-(A.13), the OSEt ""Vose^ = {''dlosE-r^''M,osE-r' 
"^OSEt)^ satisfies: 

(i) Sparsity: "/32 osEt ~ ^ '"'^^^ probability tending to 1. 

(ii) Asymptotic normality: 

iVy'r3i,osET -/3io) ^A^(0,J(/3io)~'), 

A^y'r^osET - Oo) A N{0,3{eor'), 
where J(/9io) consists of the first sx s upper-left submatrix o/ J(/3o). 

Theorem 4.3 establishes the asymptotic properties of OSEt under the 
SCAD penalty. In particular, OSEt achieves the same limiting distribution 
as OSE of (3i and in Theorem 3.2 and thus the same efficiency. Further- 
more, similar to Theorem 3.2, Theorem 4.3 holds for the class of penalty 
functions such that v'x^i ) = ^nP{-) where p'{-) is continuous on (0, oo), and 
there is some a > such that p'{f3) = 0{f3"°^) as f3 ^ 0+, provided that 

5. Numerical examples. 

5.1. Simulation study. We now conduct a simulation study to investi- 
gate the finite-sample properties of OSE and OSEt. The spatial domain of 
interest is assumed to be a square [0, Z]^ of side lengths / = 5, 10, 15. The sam- 
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pie sizes are set to be = 100,400,900 for / = 5,10,15, respectively, with 
a fixed sampling density of 4. For regression, we generate seven covariates 
that follow standard normal distributions with a cross-covariate correlation 
of 0.5. The regression coefficients are set to be /3 = (4,3,2,1,0,0,0)^. We 
standardize the covariates to have mean and variance 1 and standardize y 
to have mean 0. Thus, there will be no intercept in the vector of regression 
coefficients (3. For spatial dependence, the error terms follow an exponential 
covariance function 7((i) = o"^(l — c) exp(— d/r), where o"^ = 9 is the variance, 
c = 0.2 is the nugget effect and r = 1 is the range parameter. For each choice 
of sample size N , a total of 100 data sets are simulated. 

For each simulated data set, we compute OSE and OSEt using Algo- 
rithm 1. For OSEt, we consider different threshold values for covariance ta- 
pering ui = 1/2^ for k = 1,2, We present only the case of u: = 1/4: to save 

space. Our methods are compared against several alternatives. Of particular 
interest is OSE under a standard linear regression where spatial autocorre- 
lation is unaccounted for in the penalized loglikelihood function. This would 
be akin to PLS under SCAD in Wang and Zhu [21] and will be referred to 
as OSEAiti- In addition, we modify the initialization step of Algorithm 1 
by using MLE under the true model which is unknown but assumed to be 
known. This is an attempt to evaluate the effect of starting values and will 
be referred to as 0SEAit2- Last, we consider a benchmark case, referred to as 
0SEAit3j where the true model is assumed to be known and the MLE of the 
nonzero regression coefficients and the covariance function parameters are 
computed. Our OSE and OSEt will be compared against this benchmark 
to evaluate the oracle properties. 

For each choice of sample size A^, we first compute the average numbers 
of correctly (CO) and incorrectly (10) identified zero-valued regression coef- 
ficients from OSE /Sqse ^'^d OSEt /3osEt' ^ ^^^^ those from OSEaiii 
and 0SEAit2- The true number of zero- valued regression coefficients is 3 
as is assumed in OSEAits^^Then, we compute means of the nonzero- valued 
OSE /3i osE ^i^d OSEt /3i,osEt' ^^^^ corresponding covariance 

function parameters ^OSE and ^osEt- We estimate standard deviations 
(SDs) of the parameter estimates using the information matrix. The true 
SD is approximated by the median of the sample SD (SDm) of the 100 
parameter estimates. The results are given in Tables 1-3. 

In terms of variable selection, CO tends to the true value 3 and 10 tends 
to 0, as the sample size A^ increases, for OSE, OSEt, OSEaui and 0SEAit2- 
When the sample size is relatively small (A^ = 100), 0SEAit2 has the best 
performance with the largest CO and smallest 10, reflecting the effect of 
starting values in Algorithm 1. But it is not practical, as we do not know 
what the true model is in actual data analysis. OSEaui assuming no spatial 
dependence in the regression model seems to over-shrink the regression coef- 
ficients. While CO = 2.84 is close to 3 under OSEaiii, 10 = 0.32 is also large. 
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Table 1 

The average number of correctly identified coefficients (CO), average number of 
incorrectly identified coefficients (10), mean, standard deviation (SD) and median 
estimated standard deviation (SDm) under OSE, OSEt, OSEahi, 0SEAit2 and OSEahs 

for sample size N = 100 



iVltJlIlOtl 


XI Ulll 




wo JjjX 


uaniAiti 


»jarjAit2 


uaniAits 


CO 


3 


2.79 


2.84 


2.84 


2.95 


3.00 


10 




0.06 


0.10 


0.32 


0.06 


0.00 


/3i 


4.00 


4.01 


4.03 


4.17 


4.01 


4.01 


SD 




0.28 


0.29 


0.39 


0.27 


0.27 


SDm 




0.26 


0.27 


0.36 


0.26 


0.26 


02 


3.00 


3.04 


3.03 


3.08 


3.04 


3.03 


SD 




0.30 


0.30 


0.41 


0.30 


0.29 


SDm 




0.25 


0.26 


0.36 


0.25 


0.25 




2.00 


1.94 


1.97 


2.00 


1.94 


1.93 


SD 




0.29 


0.31 


0.50 


0.28 


0.28 


SDm 




0.25 


0.26 


0.36 


0.26 


0.26 




1.00 


1.02 


1.03 


0.78 


1.03 


1.02 


SD 




0.35 


0.40 


0.55 


0.33 


0.26 


SDm 




0.24 


0.24 


0.26 


0.24 


0.26 


r 


1.00 


0.79 


6.31 




0.83 


0.84 


SD 




0.54 


2.14 




0.57 


0.57 


SDm 




0.48 


17.65 




0.51 


0.51 


c 


0.20 


0.16 


0.23 




0.17 


0.17 


SD 




0.12 


0.13 




0.12 


0.12 


SDm 




0.11 


0.19 




0.11 


0.11 




9.00 


7.96 


7.14 


7.74 


8.03 


8.03 


SD 




2.28 


1.53 


2.06 


2.36 


2.36 


SDm 




2.21 


4.79 


1.16 


2.28 


2.28 



compared to our OSE and OSEt- Between OSE and OSEx, it appears that 
CO is slightly better, but 10 is slightly worse for OSEt than OSE. 

In terms of estimation of the nonzero regression coefficients, both accu- 
racy and precision improve as the sample size N increases, for all five OSE 
cases considered here. While the accuracy is similar between OSE^iti and 
our OSE and OSEt, a striking feature is the larger SD of OSEaiii when com- 
pared with our OSE and OSEt, for all three sample sizes = 100, 400, 900. 
This suggests that, by including spatial dependence directly in the penalized 
likelihood function, we gain statistical efficiency in parameter estimation. 
For the small sample size (A^ = 100), SD based on the information matrix 
without accounting for spatial dependence appears to underestimate the 
true variation estimated by SDm. Furthermore, the SD's of OSE and OSEt 
tend to those in the benchmark case OSEaiis as the sample size increases. 
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Table 2 

The average number of correctly identified coefficients (CO), average number of 
incorrectly identified coefficients (10), mean, standard deviation (SD) and median 
estimated standard deviation (SDm) under OSE, OSEt, OSEahi, OSE Ait2 and OSEahs 

for sample size N = 400 



iVltJlIlOtl 


XI Ulll 




wo JjjX 


uaniAiti 


»jajiiAit2 


nciF A „ 
uaniAits 


CO 


3 


2.97 


2.97 


2.97 


2.98 


3.00 


10 




0.00 


0.00 


0.01 


0.00 


0.00 


/3i 


4.00 


3.98 


3.98 


3.98 


3.99 


3.99 


SD 




0.14 


0.14 


0.20 


0.14 


0.14 


SDm 




0.13 


0.13 


0.19 


0.13 


0.13 


02 


3.00 


3.02 


3.03 


3.03 


3.02 


3.02 


SD 




0.14 


0.14 


0.21 


0.13 


0.13 


SDm 




0.13 


0.13 


0.19 


0.13 


0.13 




2.00 


2.01 


2.01 


2.01 


2.01 


2.01 


SD 




0.12 


0.12 


0.17 


0.12 


0.12 


SDm 




0.13 


0.13 


0.19 


0.13 


0.13 




1.00 


0.99 


1.00 


0.96 


1.00 


1.00 


SD 




0.12 


0.12 


0.26 


0.12 


0.12 


SDm 




0.13 


0.13 


0.19 


0.13 


0.13 


r 


1.00 


0.90 


2.87 




0.90 


0.90 


SD 




0.29 


4.08 




0.29 


0.29 


SDm 




0.25 


5.24 




0.25 


0.25 


c 


0.20 


0.19 


0.29 




0.19 


0.19 


SD 




0.06 


0.07 




0.06 


0.06 


SDm 




0.05 


0.11 




0.05 


0.05 




9.00 


8.70 


8.25 


8.71 


8.70 


8.70 


SD 




1.39 


1.00 


1.37 


1.39 


1.39 


SDm 




1.29 


2.95 


0.63 


1.29 


1.29 



confirming the oracle properties established in Sections 3 and 4. For 100 sim- 
ulations, it takes about 1 second, 30 seconds and 4 minutes per simulation 
for sample sizes N = 100, 400, 900, respectively. 

Based on these simulation results, it may be tempting to consider using 
OSEAiti to select variables and then OSEaiis for parameter estimation when 
the sample size is reasonably large, as a means of saving computational 
time. We contend that this is not necessary, as our OSE or OSEt enables 
variable selection and parameter estimation simultaneously, at the similar 
computational cost. Moreover, in practice, it is not always clear how large 
a sample size at hand really is, as an effective sample size is influenced by 
factors such as the strength of spatial dependence in the error process. 
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Table 3 

The average number of correctly identified coefficients (CO), average number of 
incorrectly identified coefficients (10), mean, standard deviation (SD) and median 
estimated standard deviation (SDm) under OSE, OSEt, OSEahi, 0SEAit2 and OSEahs 

for sample size N = 900 



iVltJlIlOtl 


XI Ulll 






uaniAiti 


»jajiiAit2 


uaniAits 


CO 


3 


3.00 


3.00 


3.00 


3.00 


3.00 


10 




0.00 


0.00 


0.00 


0.00 


0.00 


/3i 


4.00 


4.00 


4.01 


4.03 


4.00 


4.00 


SD 




0.10 


0.10 


0.13 


0.10 


0.10 


SDm 




0.09 


0.09 


0.13 


0.09 


0.09 


02 


3.00 


3.01 


3.01 


2.99 


3.01 


3.01 


SD 




0.08 


0.08 


0.12 


0.08 


0.08 


SDm 




0.09 


0.09 


0.13 


0.09 


0.09 




2.00 


1.98 


1.99 


1.98 


1.98 


1.98 


SD 




0.08 


0.08 


0.11 


0.08 


0.08 


SDm 




0.09 


0.09 


0.13 


0.09 


0.09 




1.00 


1.00 


1.00 


1.01 


1.00 


1.00 


SD 




0.09 


0.09 


0.13 


0.09 


0.09 


SDm 




0.09 


0.09 


0.13 


0.09 


0.09 


r 


1.00 


0.94 


1.44 




0.94 


0.94 


SD 




0.17 


0.50 




0.17 


0.17 


SDm 




0.17 


0.40 




0.17 


0.17 


c 


0.20 


0.19 


0.25 




0.19 


0.19 


SD 




0.04 


0.04 




0.04 


0.04 


SDm 




0.04 


0.04 




0.04 


0.04 




9.00 


8.80 


8.50 


8.80 


8.80 


8.80 


SD 




0.90 


0.74 


0.87 


0.90 


0.90 


SDm 




0.85 


1.15 


0.42 


0.85 


0.85 



5.2. Data examples. The first data example consists of January precipi- 
tation (inches per 24-hour period) on the log scale from 259 weather stations 
in the state of Colorado [15]. Candidate covariates are elevation, slope, aspect 
and seven spectral bands from a MODIS satellite imagery (BIM through 
B7M). It is of interest to investigate the relationship between precipitation 
and these covariates. 

We first fit a spatial linear model with an exponential covariance function 
via maximum likelihood. The parameter estimates and their standard errors 
in Table 4 suggest that the regression coefficients for elevation, BIM, B4M, 
B6M and B7M are possibly significant. Among the covariance function pa- 
rameters, of most interest is the range parameter, which is significantly dif- 
ferent from zero. This indicates that there is spatial autocorrelation among 
the errors in the linear regression. Our OSE method selects elevation and 
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Table 4 

Regression coefficient estimates and standard deviations (SD) using maximum likelihood 
(MLE) and one-step sparse estimation (OSE) under a spatial linear model with an 
exponential covariance function for the Gaussian error process, as well as OSE under 
a standard linear model with i.i.d. errors (OSEahi) 



Terms 


MLE 


SD 


OSE 


SD 


OSEAiti 


SD 






Ref 


rression coefficients 








Elevation 


0.305 


0.055 


0.228 


0.054 


0.195 


0.044 


Slope 


0.016 


0.026 






0.035 


0.040 


Aspect 


-0.004 


0.022 






0.032 


0.034 


BIM 


0.214 


0.157 










B2M 


0.058 


0.064 










B3M 


0.017 


0.109 










B4M 


-0.404 


0.183 


-0.089 


0.034 


-0.264 


0.045 


B5M 


0.043 


0.089 










B6M 


-0.162 


0.116 










B7M 


0.172 


0.098 














Covariance function parameters 






Range 


0.967 


0.368 


1.043 


0.417 






Nugget 


0.183 


0.061 


0.196 


0.064 








0.287 


0.067 


0.304 


0.074 


0.289 


0.026 



B4M, and shrinks all the other regression coefficients to zero. The covari- 
ance function parameter estimates are close to the MLE. For comparison, 
we fit a standard linear regression with i.i.d. errors and the corresponding 
OSEAiti selects slope and aspect in addition to elevation and B4M. How- 
ever, the regression coefficients for slope and aspect do not appear to be 
significant. 

In addition, we apply our method to the whiptail lizard data as described 
in Section 1. There are 148 sites, and the response variable is the abundance 
of lizards at each site. There are 26 covariates regarding location, vegeta- 
tion, flora, soil and ants. Hoeting et al. [10] considered only 6 covariates 
after a separate prescreening procedure, and selected 2 covariates in their 
final model. In this paper, we consider all 26 covariates simultaneously, and 
interestingly reach the same final model. For details of the results, see Sec- 
tion 6.2 in Chu et al. [1]. 

APPENDIX: TECHNICAL DETAILS 

For^ease of notation, we suppress n in "ifcfc', "ofcfc', "r, I„, A„, "77, "/3 
and "0. The detailed proofs of all lemmas and theorems are given in Chu et 
al. [1]. 
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A.l. Asymptotic properties of PMLE and OSE. 

Lemma 1. Under (A.1)-(A.7), for any given ri £W x i}, we have, as 
n oo, 

N-'/H'iv) ^ N{0,3{v)), N-H"{rt) A -J(r7), 
where J(r7) = diag{J(/3), J(0)}. 

Remark. Lemma 1 establishes the asymptotic behavior of the first- 
order and the second-order derivatives of the log-likelihood function l{r}), 

scaled by Nn ^^"^ and A^^^, respectively. In addition, by Theorem 2 of Mar- 
dia and Marshall [14], /)mle is consistent and asymptotically normal with 

II^MLE -^oll = OpiNn^'"^) and iV^^T^MLE - ^o) -^^(0> J('7o)"^)- More- 
over, for a random vector r)* , such that \\\{riY^'^{ri* — ?7)|| = Op(l), by The- 

orem 2 of Mardia and Marshah [14], we have N~^i"{r]*) — —^iv)- These 
results will be used repeatedly in the proof of Theorems 3.1 and 3.2. 

Proof of Theorem 3.1. The proof follows from Lemma 1 and argu- 
ments extended from Theorems 1 and 2 in Fan and Li [7]. See details in Chu 
et al. [1]. □ 

Proof of Theorem 3.2. The proof follows from Lemma 1 and argu- 
ments extended from Theorem 5 in Zou and Li [24]. See details in Chu et 
al. [1]. □ 

A.2. Asymptotic properties of PMLEt and OSEt- Let |^| denote the 
cardinality of a discrete set A. Let /ii,T < • • • < f^N„,T denote the eigenvalues 
of tapered covariance matrix Ft- Let /i^rp denote the eigenvalues of F^ t 

such that I/if T I — ' ' ' — l/^7V„ tI let ^^'^ denote the eigenvalues of F^fc/ x 
such that I/if X I ^ ' ■ ' ^ tI- ^ matrix A, we let /imin(A) denote the 
minimum eigenvalue of A. Also, recall that tfcfc',T = tr(F^^Ffc^TFx"^Ffc/^x)- 

Lemma 2. Under (A.12)-(A.13), we have: 

(i) ||F-FT||oo = 0(iV„"'/'); (ii) |lFfc-Ffc,T||oo = 0(iV~'/'); 

(iii) \\Tkk'-Tkk'Aoo = 0{N~^l^). 

Remark. Lemma 2 establishes that the order of the difference between 

— 1/2 

the covariance matrix F and the tapered covariance matrix Ft is Nn , 
as well as that of the first-order and the second-order derivatives of the 
covariance matrices. These results are used when establishing Lemma 3. 
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Lemma 3. Under (A.1)-(A.4), (A.6) and (A.12), (A.13), we have: 

(C.l) lim„_^oom„,T = C < oo,lim„_>oo|^^V„,Tl = Ck< oo,lim„^oo|/^^C,Tl = 
Ckk' < oo for any k,k' = 1, . . . ,q. 

(C.2) For k = l,...,q, \\Tk,T\\F^ = 0{Nn^^^~^), for some 6 > 0. 
(C.3) llr^i. <Co<oo. 

(C.4) For any k,k' = 1, . . . ,q, akk',T = lim{ifcfc',T(ifcfc,Tifc'fc',T)~^^^} exists 
and is equal to Okk' = lim{ifcfc'(*fcfc*fc'fc')^"^^^}- That is, At = [auk' ,1)1^1,' =i = 
A = {akk'Yh k'=i ^'^^ nonsingular. 

Remark. Conditions (C.1)-(C.4) are the covariance tapering counter- 
parts of (A.2), (A.3)(i), (A.4)(i) and (A.6). Together with (A.5), they yield 
Proposition 4.1. In fact, Lemmas 2 and 3 hold for other tapering functions 
such as truncated polynomial functions of d/uj with constant term equal to 1 
when d <LJ, and otherwise [22]. Furthermore, (A.12) can be weakened to 
< mfn{uJnNn ^^^"*~^} < sup„{a;.„A'n ^^^'^^l < oo, with T < min{l/2,5}. 

Lemma 4. Under (A.1)-(A.7) and (A.12)-(A.13), for any given rj G 
X $7, we have 

N~'/^£'^{rj)^N{0,J{ri)) and N-'i'^iri) ^ -3{ri), 
where recall that J{r]) = diag{J(/3), J(0)}. 

Remark. Lemma 4 establishes the asymptotic behavior of the first- 
order and the second-order derivatives of the covariance-tapered log-likeli- 
hood function -£t(^)- The rates of convergence and the limiting distributions 
are the same as those for the log-likelihood function. As in Lemma 1, it 
follows that MLEt ^7mlEt consistent and asymptotically normal, as is 
given in Proposition 4.1. These results will be used to establish Theorems 4.2 
and 4.3 and play the same role as Lemma 1 when showing Theorems 3.1 
and 3.2. 

Proof of Proposition 4.1. From Lemma 3, (C.1)-(C.4) are satisfied. 
Together with (A.5), the regularity conditions of Theorem 2 of Mardia and 
Marshall [14] hold. Thus, the result in Proposition 4.1 follows. □ 

Proof of Theorem 4.2. The proof of Theorem 4.2 is similar to that of 
Theorem 3.1. The main differences are that the parameter estimates ^pmlej 
log-likelihood function i{ri) and penalized log-likelihood Q{ri) are replaced 
with their covariance-tapered counterparts ?7pmlEt' ^t(^) and Qt^t]), re- 
spectively. Furthermore, we replace the results from Lemma 1 with those 
from Lemma 4, which holds due to Lemmas 2 and 3 under the additional 
assumptions (A.12) and (A.13). □ 
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Proof of Theorem 4.3. The proof of Theorem 4.3 is similar to that 
of Theorem 3.2, but we replace the parameter estimates t/qsej log-likelihood 
function £{r]) and Q*{(3) with their covariance-tapered counterparts t7osEt' 
^t(^) and respectively. As before, we replace the results from Lem- 

ma 1 with those from Lemma 4, where the additional conditions (A. 12) 
and (A. 13) are assumed and Lemmas 2 and 3 are applied. □ 
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